JAVA as a Basis for Parallel Data Mining in Workstation Clusters
Authors
Abstract
The exploitation of hidden information from large datasets by means of data mining techniques suffers from long response times. We address this problem by using the processing power of workstation clusters and have studied the performance of OLAP queries as a first step towards a portable data mining platform. The results of our study suggest that with the availability of parallel workstation clusters that are equipped with high performance communication networks, fine-grained and communication-intensive parallelizations of queries are promising, even though they are considered too costly in traditional database systems. The paper describes our Java framework for parallel OLAP-type query execution, necessary optimizations to the standard Java implementation, and analyzes the performance of non-standard parallel execution schemes on a workstation cluster.
Similar resources
Exploiting idle cycles to execute data mining applications on clusters of PCs
In this paper we present and evaluate Inhambu, a distributed object-oriented system that supports the execution of data mining applications on clusters of PCs and workstations. This system provides a resource management layer, built on top of Java/RMI, that supports the execution of the data mining tool called Weka. We evaluate the performance of Inhambu by means of several experiments in h...
Data mining on PC cluster connected with storage area network: its preliminary experimental results
Personal computer/Workstation (PC/WS) clusters have recently become a hot research topic in the field of parallel and distributed computing. They are expected to play an important role as large-scale computer systems, such as large server sites and/or high performance parallel computers, because of their good scalability and cost-performance ratio. From the viewpoint of applications, data inte...
Implementation and Evaluation of Parallel Data Mining on PC Cluster and Optimization of its Execution Environments
Personal Computer/Workstation clusters have been studied intensively in the field of parallel and distributed computing. From the viewpoint of applications, data-intensive applications such as data mining and ad hoc query processing in databases are considered very important for high performance computing, as are conventional scientific calculations. We have built and evaluated PC cluster pil...
Preliminary Experimental Results of a Parallel Association Rule Mining on ATM Connected PC Clusters
Until recently, workstations were overwhelmingly superior to personal computers in terms of performance. However, recent PC technology has dramatically improved CPU, main memory, and cache memory performance. Massively parallel computer systems are therefore moving away from proprietary components such as CPUs and disks towards commodity parts. As far as applications are concerned, we believe...
Using Available Remote Memory Dynamically for Parallel Data Mining Application on ATM-Connected PC Cluster
Personal computer/Workstation (PC/WS) clusters are promising candidates for future high performance computers because of their good scalability and cost-performance ratio. Data-intensive applications, such as data mining and ad hoc query processing in databases, are considered very important for massively parallel processors, as are conventional scientific calculations. Thus, investigating...